Speaker normalized acoustic modeling based on 3-D Viterbi decoding

Authors

  • Toshiaki Fukada
  • Yoshinori Sagisaka
Abstract

This paper describes a novel method for speaker normalization based on a frequency warping approach to reduce variations due to speaker-induced factors such as vocal tract length. In our approach, a speaker-normalized acoustic model is trained using time-varying (i.e., state-, phoneme- or word-dependent) warping factors, whereas in conventional approaches the frequency warping factor is fixed for each speaker. These time-varying frequency warping factors are determined by a 3-dimensional (i.e., input frames, HMM states and warping factors) Viterbi decoding procedure. Experimental results on Japanese spontaneous speech recognition show that the proposed method yields a 9.7% improvement in speech recognition accuracy compared to the conventional speaker-independent model.
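The abstract does not spell out the recursion, so the following is only a minimal sketch of what a joint search over input frames, HMM states and warping factors could look like, not the authors' implementation. It assumes the log-likelihood of every frame under every state and every candidate warping factor has already been computed (`log_emis`), and it assumes that the warping index may move by at most `max_warp_step` between consecutive frames. That continuity constraint and all names in the code are hypothetical and are not taken from the paper.

```python
NEG_INF = float("-inf")


def viterbi_3d(log_emis, log_trans, log_init, max_warp_step=1):
    """Joint Viterbi search over (frame, state, warp index).

    log_emis[t][s][w]: log-likelihood of frame t, warped with factor w, in state s
    log_trans[p][s]:   log-probability of moving from state p to state s
    log_init[s]:       log-probability of starting in state s
    max_warp_step:     assumed limit on warp-index change between frames
    Returns (best_score, path) where path is a list of (state, warp) pairs.
    """
    T = len(log_emis)
    S = len(log_init)
    W = len(log_emis[0][0])

    # delta[s][w]: best score of any path ending in (s, w) at the current frame
    delta = [[log_init[s] + log_emis[0][s][w] for w in range(W)]
             for s in range(S)]
    back = []  # back[t - 1][s][w]: best predecessor (state, warp) of (s, w)

    for t in range(1, T):
        new_delta = [[NEG_INF] * W for _ in range(S)]
        ptr = [[None] * W for _ in range(S)]
        for s in range(S):
            for w in range(W):
                lo = max(0, w - max_warp_step)
                hi = min(W - 1, w + max_warp_step)
                best, arg = NEG_INF, None
                # Maximise over previous states and nearby warp indices
                for p in range(S):
                    for v in range(lo, hi + 1):
                        score = delta[p][v] + log_trans[p][s]
                        if score > best:
                            best, arg = score, (p, v)
                if arg is not None:
                    new_delta[s][w] = best + log_emis[t][s][w]
                    ptr[s][w] = arg
        delta = new_delta
        back.append(ptr)

    # Pick the best final cell, then follow the back-pointers
    best_score, end = max((delta[s][w], (s, w))
                          for s in range(S) for w in range(W))
    if best_score == NEG_INF:
        raise ValueError("no path with finite score")
    path = [end]
    for ptr in reversed(back):
        s, w = path[-1]
        path.append(ptr[s][w])
    path.reverse()
    return best_score, path
```

Setting `max_warp_step=0` forces a single warping factor over the whole utterance, which corresponds to the conventional fixed-per-speaker case the abstract contrasts against. Larger values let the factor vary over time at a cost of roughly T x S^2 x W x (2 x max_warp_step + 1) score updates.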


Similar resources

The dynamically-adjustable histogram pruning method for embedded voice dialing

Memory and speed are two key factors that must be addressed when applying a voice dialer to Pocket PCs. To provide a solution, a novel decoding method integrated with the score differences of token paths is proposed, named “Dynamically-Adjustable Histogram Pruning”. Additionally, the computation of the likelihood score is accelerated by means of a dynamic score lookup table. Furthermore, a new acoustic ...

Speaker Diarization Based on GMM Supervectors and Unsupervised Intra-speaker Variability Modeling

This paper presents a novel framework for speaker diarization. Audio is parameterized by a sequence of GMM-supervectors representing overlapping short segments of speech. Session-dependent intra-session intra-speaker variability is estimated online in an unsupervised manner, and is removed from the supervectors using Nuisance Attribute Projection (NAP). The supervectors are then projected using ...

Probabilistic Speaker-Class based Acoustic Modeling for Large Vocabulary Continuous Speech Recognition

In this paper, a probabilistic speaker-class (PSC) based acoustic modeling method is proposed to account for the influence of speaker variability in HMM-based speech recognition systems. First, within the context of speaker-class based speech recognition, an experiment is conducted to investigate the performance of speaker-class recognition based on hard-cut speaker clustering. Then, in the...

Augmented state space acoustic decoding for modeling local variability in speech

This paper presents a decoding method for automatic speech recognition (ASR) that reduces the impact of local spectral and temporal variabilities on ASR performance. The procedure involves augmenting the standard Viterbi search for an optimum state sequence with a locally constrained search for optimum degrees of spectral warping or temporal warping applied to individual analysis frames. It is ...

A sub-optimal Viterbi-like search for linear dynamic models classification

This paper describes a Viterbi-like decoding algorithm applied to segment models based on linear dynamic systems (LDMs). LDMs are a promising acoustic modeling scheme which can alleviate several of the limitations of the popular Hidden Markov Models (HMMs). Several implementations of LDMs can be found in the literature. For our decoding experiments we consider general identifiabl...


Publication date: 1998